candidate compound
Proposing Novel Extrapolative Compounds by Nested Variational Autoencoders
Osakabe, Yoshihiro, Asahara, Akinori
Materials informatics (MI), which uses artificial intelligence and data analysis techniques to improve the efficiency of materials development, is attracting increasing interest from industry. One of its main applications is the rapid development of new high-performance compounds. Recently, several deep generative models have been proposed to suggest candidate compounds that are expected to satisfy the desired performance. However, they usually require a large amount of experimental data for training to achieve sufficient accuracy. In practice, it is often possible to accumulate only about 1000 experimental data points at most. Therefore, the authors proposed a deep generative model with two nested variational autoencoders (VAEs). The outer VAE learns the structural features of compounds using large-scale public data, while the inner VAE learns the relationship between the latent variables of the outer VAE and the properties from small-scale experimental data. To generate high-performance compounds beyond the range of the training data, the authors also proposed a loss function that amplifies the correlation between a component of the latent variables of the inner VAE and material properties. The results indicated that this loss function helps improve the probability of generating high-performance candidates. Furthermore, a verification test with an actual customer in the chemical industry confirmed that the proposed method reduced the number of experiments to $1/4$ of that required by a conventional method.
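The abstract does not give the form of the correlation-amplifying loss, so the following is only a minimal sketch of one way such a term could look. It assumes the extra term is one minus the Pearson correlation between a chosen latent component and the measured property; the function names, the `component` index, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        # Correlation is undefined for a constant sequence; treat it as zero.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_penalty(latents, properties, component=0, weight=1.0):
    """Hypothetical extra loss term (an assumption, not the paper's formula).

    `latents` is a list of latent vectors from the inner VAE and `properties`
    holds the matching measured property values. The penalty is 0 when the
    chosen latent component is perfectly correlated with the property and
    2 * weight when it is perfectly anti-correlated.
    """
    z = [row[component] for row in latents]
    return weight * (1.0 - pearson(z, properties))


# Toy batch: the first latent component tracks the property exactly.
toy_latents = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0]]
toy_properties = [0.0, 1.0, 2.0]
penalty = correlation_penalty(toy_latents, toy_properties, component=0)
```

In a training loop, a term like this would be added to the usual VAE reconstruction and KL terms, so that moving along the chosen latent axis tends to move the predicted property in one direction.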
How Machine Learning is accelerating Drug Design part1
Abstract: Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structure-based Drug Design
Wu, Kehan, Xia, Yingce, Fan, Yang, Deng, Pan, Liu, Haiguang, Wu, Lijun, Xie, Shufang, Wang, Tong, Qin, Tao, Liu, Tie-Yan
Structure-based drug design is drawing growing attention in computer-aided drug discovery. Compared with the virtual screening approach, where a pre-defined library of compounds is computationally screened, de novo drug design based on the structure of a target protein can provide novel drug candidates. In this paper, we present a generative solution named TamGent (Target-aware molecule generator with Transformer) that can directly generate candidate drugs from scratch for a given target, overcoming the limits imposed by existing compound libraries. Following the Transformer framework (a state-of-the-art framework in deep learning), we design a variant of the Transformer encoder to process 3D geometric information of targets and pre-train the Transformer decoder on 10 million compounds from PubChem for candidate drug generation. Systematic evaluation of candidate compounds generated for targets from DrugBank shows that both binding affinity and druggability are largely improved. TamGent outperforms previous baselines in terms of both effectiveness and efficiency. The method is further verified by generating candidate compounds for the SARS-CoV-2 main protease and the oncogenic mutant KRAS G12C. The results show that our method not only re-discovers previously verified drug molecules, but also generates novel molecules with better docking scores, expanding the compound pool and potentially leading to the discovery of novel drugs.
Reinforcement learning-driven de-novo design of anticancer compounds conditioned on biomolecular profiles
Born, Jannis, Manica, Matteo, Oskooei, Ali, Martínez, María Rodríguez
With the advent of deep generative models in computational chemistry, in silico anticancer drug design has undergone an unprecedented transformation. While state-of-the-art deep learning approaches have shown potential in generating compounds with desired chemical properties, they entirely overlook the genetic profile and properties of the target disease. In the case of cancer, this is problematic since it is a highly genetic disease in which the biomolecular profile of target cells determines the response to therapy. Here, we introduce the first deep generative model capable of generating anticancer compounds given a target biomolecular profile. Using a reinforcement learning framework, the transcriptomic profile of cancer cells is used as a context in which anticancer molecules are generated and optimized to obtain effective compounds for the given profile. Our molecule generator combines two pretrained variational autoencoders (VAEs) and a multimodal efficacy predictor - the first VAE generates transcriptomic profiles while the second conditional VAE generates novel molecular structures conditioned on the given transcriptomic profile. The efficacy predictor is used to optimize the generated molecules through a reward determined by the predicted IC50 drug sensitivity for the generated molecule and the target profile. We demonstrate how the molecule generation can be biased towards compounds with high inhibitory effect against individual cell lines or specific cancer sites. We verify our approach by investigating candidate drugs generated against specific cancer types and examining their structural similarity to existing compounds with known efficacy against these cancer types. We envision that our approach will transform in silico anticancer drug design by increasing success rates in lead compound discovery via leveraging the biomolecular characteristics of the disease.
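The abstract states only that the reward is determined by the predicted IC50 for a generated molecule and a target profile; it does not give the mapping. As a rough sketch under that limitation, the example below assumes a logistic mapping from a predicted log IC50 to a reward in (0, 1), so that a lower IC50 (a more potent compound) earns a higher reward. The names `reward_from_ic50`, `pick_best`, `toy_predictor`, and the `scale` parameter are hypothetical.

```python
import math


def reward_from_ic50(predicted_log_ic50, scale=1.0):
    """Map a predicted log IC50 to a reward in (0, 1).

    Assumed mapping (not from the paper): a decreasing logistic function,
    so lower predicted IC50 values give rewards closer to 1.
    """
    return 1.0 / (1.0 + math.exp(predicted_log_ic50 / scale))


def pick_best(candidates, predictor, profile):
    """Return the candidate with the highest reward for the given profile.

    `predictor(molecule, profile)` stands in for the efficacy predictor and
    must return a predicted log IC50.
    """
    return max(candidates, key=lambda mol: reward_from_ic50(predictor(mol, profile)))


def toy_predictor(molecule, profile):
    """Stand-in predictor for illustration only: looks up a fixed score."""
    return profile[molecule]


# Toy data: hypothetical molecule labels with made-up predicted log IC50 values.
toy_profile = {"mol_a": 1.5, "mol_b": -0.5, "mol_c": 0.0}
best = pick_best(list(toy_profile), toy_predictor, toy_profile)
```

In a reinforcement learning setup of the kind described, a scalar like this would be fed back to the generator so that sampling shifts towards molecules the predictor scores as more effective for the given profile.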
Deep Learning for Estimating Synaptic Health of Primary Neuronal Cell Culture
Kormilitzin, Andrey, Yang, Xinyu, Stone, William H., Woffindale, Caroline, Nicholls, Francesca, Ribe, Elena, Nevado-Holgado, Alejo, Buckley, Noel
Understanding the morphological changes of primary neuronal cells induced by chemical compounds is essential for drug discovery. Using the data from a single high-throughput imaging assay, a classification model for predicting the biological activity of candidate compounds was introduced. The image recognition model, which is based on a deep convolutional neural network (CNN) architecture with residual connections, achieved an accuracy of 99.6$\%$ on a binary classification task of distinguishing untreated rodent primary neuronal cells from those treated with Amyloid-$\beta_{(25-35)}$.
How AI and Genomics Can Treat Epilepsy
Epilepsy is among the most common neurological disorders, affecting 65 million people of all ages globally. In the United States, 3.4 million people have epilepsy, according to the CDC. Epilepsy can interfere with a person's ability to drive a car, play sports, swim, or exercise. It is a non-contagious brain disorder in which recurrent, unprovoked seizures occur. Epilepsy may be caused by many factors, including traumatic brain injuries, stroke, loss of oxygen to the brain, brain tumors, parasitic brain infections (malaria, neurocysticercosis from tapeworms), viral infections (Zika, dengue, influenza), bacterial brain infections, neurological diseases, genetic predisposition, and other causes.